Abstract 

Storage area networks, remote backup storage systems, and similar information systems frequently 
modify stored data with updates from new versions. In these systems, it is desirable for the data to not only 
be compressed but to also be easily modified during updates. A malleable coding scheme considers both 
compression efficiency and ease of alteration, promoting some form of reuse or recycling of codewords. 
Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of 
particular interest when representing information and modifying the representation are both expensive. 
We examine the trade-off between compression efficiency and malleability cost measured with respect 
to the length of a reused prefix portion. The region of achievable rates and malleability is formulated as 
an information-theoretic optimization and a single-letter expression is provided. Relationships to coded 
side information and common information problems are also established. 
I. Introduction 

Conventional data compression uses a small number of compressed-domain symbols but otherwise picks 
the symbols without care. This carelessness renders codewords utterly disposable; little can be salvaged 
when the source data changes even slightly. Such data compression is concerned only with reducing the 
length of coded representations. In this paper and a companion paper with a distinct formulation [1], we 
adopt the mantra of the green age, reduce, reuse, recycle. We formulate problems motivated not only by 
reduction of representation length but also by the reuse or recycling of compressed data when the source 
sequence to be coded changes. 

In Shannon's original formulation of asymptotically lossless block coding,^1 "the high probability group 
is coded in an arbitrary one-to-one way" into an index set of the appropriate size. This arbitrariness may 
seem to impede reuse, but it also suggests that many codes are equally good for compression, and one 
may choose amongst them to optimize a reuse criterion.^2 One may also allow suboptimal compression 
to improve reuse; this trade-off, under a specific model of reuse, is the focus of this paper. 

Moving toward formalizing, suppose that after compressing a random source sequence $X_1^n$, it is 
modified to become a new source sequence $Y_1^n$ according to a memoryless editing process $p_{Y|X}$. A 
malleable coding scheme preserves some portion of the codeword of $X_1^n$ and modifies the remainder 
into a new codeword from which $Y_1^n$ may be decoded reliably using the same deterministic codebook. 

There are several possible notions of preserving a portion of the codeword of $X_1^n$. Here we concentrate 
on a malleability cost defined through the reuse of a fixed part of the old codeword in generating a 
codeword for $Y_1^n$. We call this fixed segment reuse since a segment is cut from the codeword for $X_1^n$ 
and reused as part of the codeword for $Y_1^n$. Without loss of generality, the fixed portion can be taken to 
be the beginning of the codeword, so the new codeword is a fixed prefix followed by a new suffix. 

The fixed reuse formulation is suitable for applications where the update information (new suffix) must 
be transmitted through a rate-limited communication channel. If the locations of changed symbols were 
arbitrary, the locations would also need to be communicated, communication which may be prohibitively 
costly. This formulation is also suitable for information storage systems that use linked lists such as the 
FAT and NTFS systems. A contrasting scenario is for a cost to be incurred when a symbol is changed 
in value, regardless of its location. We studied this in [1]. 

^1 From [2] with emphasis added. 

^2 The arbitrariness of code mappings has also been exploited in redundancy-free methods for joint source channel coding [3] 
and for modulation [4], in a manner related to [1]. 
Our main result is a characterization of achievable rates as a single-letter expression. To the best of our 
knowledge, this is among the first works connecting problems of information storage (communication 
across time) with problems in multiterminal information theory. We relate the fixed reuse problem to 
several previously studied problems in multiterminal information theory, some of which are exploited 
in this work. In particular, a connection to the Gács-Körner common information shows that a large 
malleability cost must be incurred if the rates for the two versions are required to be near entropy. 

The remainder of the paper is organized as follows. Motivations from engineering practice in areas 
such as database management and network information storage are given in Section II. Section III then 
provides a formal problem statement for malleable coding with fixed segment reuse. The region describing 
the trade-off between the rates for the original codeword, for the reused portion, and for the new codeword 
is the main object of study. Section III-B uses an implicit Markov property to simplify the analysis of the 
rate-malleability region and Section III-C describes two easily achieved points. Using a random coding 
argument, Theorem 1 in Section IV gives an achievable rate-malleability region in terms of an auxiliary 
random variable. There is also a matching converse. Section IV-B looks at the auxiliary random variable 
in detail; Theorem 2 is a partial characterization of the unknown auxiliary random variable when there is a 
sufficient statistic for the new version based on the old version. Section V connects this malleable coding 
problem to other problems in multiterminal information theory. Section VI closes the paper, drawing 
comparison to the problem of designing side information. 

II. Background 

Our study of malleable coding is motivated by information systems that store frequently-updated 
documents. In such systems, storage costs include not only the average length of the coded signal, but 
also costs in updating. We describe these systems and some of their applications. 

In information technology infrastructures, there is often a separation between computer hosts used 
to process information and storage devices used to store information. Storage area networks (SAN) 
and network-attached storage (NAS) are two technologies that transfer data between hosts and storage 
elements. SAN and NAS systems comprise a communication infrastructure for physical connections 
and a management infrastructure for organizing connections, storage elements, and computers for robust 
and efficient data transfers [5], [6]. Grid computing and distributed storage systems also display similar 
distributed caching [7], [8]. Even within single computers, updating caches within the memory hierarchy 
involves data transfers among levels [9]. 

Data may be dynamic, being updated or edited after some time. Separate data streams may be generated, 
but the contents may differ only slightly [10]-[13]. Moreover, old versions of the stream need not be 
preserved. Examples include the storage of a computer file backup system after a day's work or graded 
homework in distance learning [14]. Correlation among versions differentiates malleable coding from 
write-efficient memories [15], where messages are assumed independent; see [1] for further contrasts. 

Fig. 1. Distributed database access. 

Storage of communication transcripts in email hosting services such as GMail provides another area 
where different versions of snippets of text are stored in one common access point. The problem there is 
made more interesting by the presence of a large number of users who have created different modifications 
of original shared sources. We do not deal with such problems explicitly. 

Systems such as SAN and NAS have complicated interplays between storage and transmission. Current 
technological trends in transmission and storage technologies show that transmission capacity has grown 
more slowly than disk storage capacity [7]. Hence "new" representation symbols may be more expensive 
than "old" representation symbols, suggesting that reuse may be more economical than reduce. 

Recent advances in biotechnology have demonstrated storage of artificial messages in the DNA of 
living organisms [16]; such systems provide another motivating application. Certain biotechnical editing 
costs correspond to the malleability costs defined for fixed reuse, as detailed in [1]. 

Here we describe several scenarios where malleable coding is applicable. Consider the topology given 
in Fig. 1. The first user has stored a codeword A for document X in database 1. Now the second user, 
who has a copy of X, modifies it to create Y . The second user wants to save the new version to the 
information system, but since the users are separated, database 2 rather than database 1 serves this user. 
Transmission costs for different links may be different. The natural problem is to minimize the total cost 
needed to create a codeword B at database 2 that losslessly represents Y . 

Consider two users who both have access to a distributed database system that stores several copies 
of the first user's document on different media at different locations. Due to proximity considerations, 
the users will access the document from different physical stores. Suppose that the first user downloads 
and edits her document and then wishes to send the new version to the second user. There are different 
ways to accomplish this. The first user can send the entire new version to the second user or the second 
user can download the old version from his local store and require that the first user only send the 
modification. In the former scheme, the cost of transmission is borne entirely by the link between the 
users, rendering distributed storage pointless. In the latter scheme, there is a trade-off between the rate 
at which the second user downloads the original version from the database system and the rate at which 
the first user communicates the modification. 

Even in a single user scenario, there may be similar considerations. The first user may simply wish 
to update the storage device with her edited version. The goal would be to avoid having to create an 
entirely new version of the stored codeword by taking advantage of the availability of the stored original 
in the database. 

III. Problem Statement and Simplification 

We are now ready to give the formal problem statement. Following the formal problem statement, we 
deduce simplifications to the problem statement and quickly find two achievable points. 

A. Formal Problem Statement 

Let $\{(X_i, Y_i)\}_{i=1}^{\infty}$ be a sequence of independent drawings of a pair of jointly distributed random 
variables $(X, Y)$, $X \in \mathcal{W}$, $Y \in \mathcal{W}$, where $\mathcal{W}$ is a finite set and $p_{X,Y}(x,y) = \Pr[X = x, Y = y]$. The 
marginal distributions are 
$$p_X(x) = \sum_{y \in \mathcal{W}} p(x,y)$$
and 
$$p_Y(y) = \sum_{x \in \mathcal{W}} p(x,y),$$
and the conditional distribution 
$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$$
describes a modification channel. When the random variable is clear from context, we write $p_X(x)$ as 
$p(x)$ and so on. 

Denote the storage medium alphabet by $\mathcal{V}$, which is also a finite set. It is natural to measure all rates in 
numbers of symbols from $\mathcal{V}$. This is analogous to using base-$|\mathcal{V}|$ logarithms in place of base-2 logarithms, 
and all logarithms should be interpreted as such. 



September 4, 2008 



DRAFT 






Fig. 2. In malleable coding with fixed segment reuse, the compressed representations of $X_1^n$ and $Y_1^n$ have the first $nJ$ storage 
symbols in common. 

Our interest is in coding of $X_1^n$ followed by coding of $Y_1^n$ where the first $nJ$ letters of the codes 
are (asymptotically almost surely) in common. We show this in Fig. 2, where $A_1^{nK} \in \mathcal{V}^{nK}$ is the 
representation of $X_1^n$, $B_1^{nL} \in \mathcal{V}^{nL}$ is the representation of $Y_1^n$, and $U_1^{nJ} \in \mathcal{V}^{nJ}$ is the common initial 
symbols. We thus define the encoding and decoding mappings as follows. 

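An encoder for $X$ with parameters $(n, J, K)$ is the concatenation of two mappings: 
$$f_E^{(X)} = f_E^{(U)} \times f_E^{(X')},$$
where 
$$f_E^{(U)} : \mathcal{W}^n \to \mathcal{V}^{nJ}$$
and 
$$f_E^{(X')} : \mathcal{W}^n \to \mathcal{V}^{n(K-J)}.$$
An encoder for $Y$ with parameters $(n, J, L)$ is defined as: 
$$f_E^{(Y)} = f_E^{(U)} \times f_E^{(Y')},$$
where we use one of the previous encoders $f_E^{(U)}$ together with 
$$f_E^{(Y')} : \mathcal{W}^n \times \mathcal{V}^{nJ} \to \mathcal{V}^{n(L-J)}.$$
Given these encoders, a common decoder with parameter $n$ is 
$$f_D : \mathcal{V}^{*} \to \mathcal{W}^n.$$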

The encoders and decoder define a block code for fixed reuse malleability. Although not strictly required, a 
common decoder is a convenient way of expressing the requirement of a common deterministic codebook. 
A trio $(f_E^{(X)}, f_E^{(Y)}, f_D)$ with parameters $(n, J, K, L)$ is applied as follows. Let 
$$A_1^{nK} = f_E^{(X)}(X_1^n)$$
be the source code for $X_1^n$, where the first part of the code is explicitly notated as 
$$U_1^{nJ} = f_E^{(U)}(X_1^n).$$
Then the encoding of $Y_1^n$ is carried out as 
$$B_1^{nL} = f_E^{(Y)}(Y_1^n, U_1^{nJ}).$$
We also let 
$$(\hat{X}_1^n, \hat{Y}_1^n) = (f_D(A_1^{nK}), f_D(B_1^{nL})).$$
We define the error rate 
$$\Delta = \max(\Delta_X, \Delta_Y),$$
where 
$$\Delta_X = \Pr[X_1^n \neq \hat{X}_1^n]$$
and 
$$\Delta_Y = \Pr[Y_1^n \neq \hat{Y}_1^n],$$
and we define the disagreement rate as 
$$\Delta_U = \Pr[A_1^{nJ} \neq B_1^{nJ}].$$

The fact that there is a disagreement rate rather than requiring the first $nJ$ symbols to always be equal 
introduces the usual slack associated with Shannon reliability. (We will require $\Delta_U$ to be arbitrarily small, 
so the possibility of $A_1^{nJ} \neq B_1^{nJ}$ is ignored in Fig. 2.) 

We use conventional performance criteria for the code, which are the numbers of storage-medium 
letters per source letter 
$$K = \frac{1}{n} \log_{|\mathcal{V}|} |\mathcal{V}|^{nK}$$
and 
$$L = \frac{1}{n} \log_{|\mathcal{V}|} |\mathcal{V}|^{nL},$$
and add as the third performance criterion the normalized length of the portion of the code which does 
not overlap 
$$M = L - J = \frac{1}{n} \log_{|\mathcal{V}|} |\mathcal{V}|^{n(L-J)}.$$
We call $M$ the malleability rate. 

Fig. 3. Block diagram of malleable coding with fixed segment reuse. 

Definition 1: Given a source $p(X, Y)$, a triple $(K_0, L_0, M_0)$ is said to be achievable if, for arbitrary $\epsilon > 
0$, there exists (for $n$ sufficiently large) a block code for fixed reuse with error rate $\Delta < \epsilon$, disagreement 
rate $\Delta_U < \epsilon$, and lengths $K < K_0 + \epsilon$, $L < L_0 + \epsilon$, and $M < M_0 + \epsilon$. 

We want to determine the set of achievable rate triples, denoted as $\mathcal{M}$. It follows from the definition 
that $\mathcal{M}$ is a closed subset of $\mathbb{R}^3$ and has the property that if $(K_0, L_0, M_0) \in \mathcal{M}$, then $(K_0 + \delta_0, L_0 + 
\delta_1, M_0 + \delta_2) \in \mathcal{M}$ for any $\delta_i > 0$, $i = 0, 1, 2$. The rate region $\mathcal{M}$ is thus completely defined by its lower 
boundary, which is itself closed. 

Rather than using $(K, L, M)$, the triple $(J, K, L)$ may be used to characterize the achievable region. 
Equivalently, we can use $(R_0, R_1, R_2)$ in place of $(K, L, M)$ as shown in Fig. 3. Using this notation is 
more consistent with established work in multiterminal information theory. The relation is: 

1) $J = R_0$, 

2) $K = R_0 + R_1$, and 

3) $L = R_0 + R_2$. 

B. Problem Simplification 

A priori, it seems there are two approaches to trading off storage rate for malleability rate in the fixed 
reuse problem: expending $K$ greater than $H(X)$ might allow a better side information $U$ to be formed; 
and expending $L$ greater than $H(Y)$ might allow greater flexibility in the design of $U$. It turns out that 
expanding the representation of $X_1^n$ provides no advantage, i.e., any extra bits used to encode $X$ will not 
help in the representation for $Y$. This is due to the Markov relation $U \leftrightarrow X \leftrightarrow Y$ that holds due to the 
ordering of the encoding procedure. 

For the remainder of this paper, we focus on expending $L$ greater than $H(Y)$ and analyze the achievable 
rate-malleability. We focus on how $L$ depends on the size of the portion to be reused $J$, thus establishing 
the malleability $M$. When proceeding in this regime, two constraints are imposed: 

1) $H(U) = J$, and 

2) $H(U, X) = H(X)$. 

The second constraint states that $U$ is a subrandom variable of $X$, which is implicit in the formal problem 
statement in Section III and the block diagram, Fig. 3. 

Rather than characterizing the entire region of achievable triplets $\mathcal{M}$, we consider fixing $J$ and finding 
the best $L$ (thus fixing $M = L - J$). We want to characterize the achievable rates $L$ as a function of $J$. 
The smallest such $L$ is denoted $L^*(J)$. 

C. Two Achievable Points 

It is easy to note the values of the corner points corresponding to $J = 0$ and $J = H(X)$. For $J = 0$, 
the lossless source coding theorem yields $L^*(0) = H(Y)$. For $J = H(X)$, since the lossless compression 
of $X_1^n$ has to be preserved, we will need $L^*(H(X)) = H(X,Y)$. This follows from noting that since 
the first $H(X)$ symbols have to be fixed, we need to be able to losslessly represent the conditionally 
typical set, which requires $H(Y|X)$ additional symbols, for a total of $H(X) + H(Y|X) = H(X,Y)$. 
Since $H(Y|X) \le H(Y)$, this requires no more new symbols than discarding the old codeword of $X_1^n$ and 
creating an entirely new codeword for $Y_1^n$; unless $X$ and $Y$ are independent, it requires strictly fewer. 
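
As a numerical illustration of the two corner points, the following minimal sketch evaluates $H(X)$, $H(Y)$, and $H(X,Y)$ for a toy source. The source (a fair bit passed through a binary symmetric editing channel with flip probability 0.1) and the helper names `entropy` and `corner_points` are assumptions made for illustration only; rates are in bits, i.e., a binary storage alphabet is assumed.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of an iterable of probabilities, in units of log `base`."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def corner_points(p_xy, base=2.0):
    """Return (H(X), H(Y), H(X,Y)) for a joint pmf given as {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return (entropy(p_x.values(), base),
            entropy(p_y.values(), base),
            entropy(p_xy.values(), base))

# Hypothetical toy source: X is a fair bit, Y flips X with probability 0.1.
flip = 0.1
p_xy = {(x, y): 0.5 * (flip if x != y else 1 - flip)
        for x in (0, 1) for y in (0, 1)}

h_x, h_y, h_xy = corner_points(p_xy)
L_at_zero = h_y               # J = 0:    L*(0) = H(Y)
L_at_hx = h_xy                # J = H(X): L*(H(X)) = H(X, Y)
new_symbols = L_at_hx - h_x   # suffix rate M = L - J = H(Y|X)
print(h_x, L_at_zero, L_at_hx, new_symbols)
```

For this toy source the suffix rate at $J = H(X)$ is the binary entropy of the flip probability, which is well below $H(Y)$; the total length $L$ is nevertheless larger than at $J = 0$.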

IV. Main Results 

We cast the fixed reuse malleable coding problem as a single-letter information-theoretic optimization, 
providing matching achievability and converse statements. Unfortunately, this is not computable in 
general. Later we will give a computable partial characterization for cases where there exists a sufficient 
statistic for the estimation of the new version of the source from the reused part of the compressed old 
version. The basic concepts are also applicable to a lossy formulation with Gaussian sources. 

The achievability proof for the boundary of the fixed reuse rate region uses definitions and properties 
of strongly typical sets (Lemmas 1-4), given in Appendix A. 

A. The Fixed Reuse Malleability Region 

We consider the trade-off between $L$ and $J$. From the previous section, it is clear that for a given 
malleability, the compression efficiency of $Y_1^n$ is determined by the quality of the binning assignment 
for the typical strings of $X_1^n$. We capture this assignment by a (probabilistic) function $p(U|X)$. Then, 
we can formulate the following information-theoretic optimization problem: 
$$L^*(J) - J = \min_{p(U|X)} H(Y|U) \tag{1}$$
$$\text{s.t.}\quad H(U) + H(X|U) = H(X), \quad H(U) = J.$$

Theorem 1: The optimization problem (1) provides a boundary to the rate region $\mathcal{R} = \{(R_0, R_1, R_2)\}$ 
when $K = R_0 + R_1 = H(X)$. 

Proof: Achievability: The constraints require $H(U) = J$ and that there is a Markov condition 
$U \leftrightarrow X \leftrightarrow Y$. Codebooks for $X_1^n$ and $Y_1^n$ are randomly generated according to $p(x)$ and $p(y)$. These 
codebooks are of size $|\mathcal{V}|^{nK} = |\mathcal{V}|^{nH(X)}$ and $|\mathcal{V}|^{nL}$ respectively. Each codebook is partitioned into 
$|\mathcal{V}|^{nH(U)}$ bins, each with a corresponding bin label $U_1^{nJ}$. Since $U_1^{nJ}$ is a function of $X_1^n$, it may be written as 
$U_1^{nJ}(X_1^n)$. Clearly, we can choose $H(U) = J$ and use $J$ symbols to assign the bin label $U_1^{nJ}$. For the 
$X_1^n$ codebook, $H(X) - J$ symbols are used to assign labels to members of each bin; the intra-bin label 
is denoted $I_X$. Similarly for the $Y_1^n$ codebook, $L - J$ symbols are used to assign labels to members of 
each bin; the intra-bin label is denoted $I_Y$. 

The encoder for $X_1^n$, $f_E^{(X)} = f_E^{(U)} \times f_E^{(X')}$, operates by generating a label $A_1^{nK} = [U_1^{nJ}, I_X]$ according 
to which $x_1^n$ is realized. The encoder for $Y_1^n$, $f_E^{(Y)} = f_E^{(U)} \times f_E^{(Y')}$, generates the same bin label $U_1^{nJ}$ 
and also generates the intra-bin label $I_Y$, based on which $y_1^n$ is realized; the resulting encoding is 
$B_1^{nL} = [U_1^{nJ}, I_Y]$. Since both encoders use the identical bin label $U_1^{nJ}$, it is clear that the disagreement 
rate $\Delta_U$ can be made arbitrarily small. 

The common decoder $f_D$ operates according to strong typicality in the usual way. 

By the direct part of Shannon's source coding theorem (see Lemma 1) and the splitting possible due to 
the entropy chain rule [17], it follows that $\Delta_X = \Pr[X_1^n \neq f_D(A_1^{nK})]$ is arbitrarily small with increasing 
block length. 

Now consider recovering $Y_1^n$ from the codeword $B_1^{nL} = [U_1^{nJ}, I_Y]$, which uses the same prefix $U_1^{nJ}$ 
but different suffix. The encoder had found the index $I_Y$ such that $(U_1^{nJ}(X_1^n), Y_1^n) \in T^n_{[UY]_\delta}$. The 
probability of successful encoding is determined by two error events. The first is that $(U_1^{nJ}, Y_1^n)$ does 
not belong to the typical set; the second is that $U_1^{nJ}$ is jointly typical with $X_1^n$ but not with $Y_1^n$. The 
first event has arbitrarily small probability of error by the joint AEP, Lemma 2. The second event has 
arbitrarily small probability of error by applying Lemma 4 to the $U \leftrightarrow X \leftrightarrow Y$ Markov chain. 

Decoding error happens when there is another typical $\tilde{Y}_1^n \neq Y_1^n$ that is jointly typical with $U_1^{nJ}$. The 
probability goes to zero almost surely when $L - J > H(Y|U)$ by an AEP argument [18, (14.278)]. 

Thus $\Delta_Y$ and also $\Delta$ may be made arbitrarily small, as required for achievability. 

Converse: The converse for the encoding and decoding of $X_1^n$ via $[U_1^{nJ}, I_X]$ as a tree-based label 
follows directly from the converse to Shannon's source coding theorem. 

We focus on the encoding of $Y_1^n$ onto $[I_Y]$ and the decoding of $Y_1^n$ from $[U_1^{nJ}, I_Y]$. By the encoding 
strategy, $U$ is a function of $X_1^n$. We then have a chain of inequalities: 
$$\begin{aligned}
n(L - J) = nR_2 &\overset{(a)}{\ge} H(I_Y) \\
&\overset{(b)}{\ge} H(I_Y | U_1^{nJ}) \\
&= I(Y_1^n; I_Y | U_1^{nJ}) + H(I_Y | Y_1^n, U_1^{nJ}) \\
&\overset{(c)}{=} I(Y_1^n; I_Y | U_1^{nJ}) \\
&= H(Y_1^n | U_1^{nJ}) - H(Y_1^n | I_Y, U_1^{nJ}) \\
&\overset{(d)}{\ge} H(Y_1^n | U_1^{nJ}) - n\epsilon \\
&\overset{(e)}{=} nH(Y|U) - n\epsilon.
\end{aligned}$$

Step (a) follows from dimensionality considerations; step (b) from noting that conditioning can only 
decrease entropy; step (c) from the fact that $Y_1^n$ and $U_1^{nJ}$ determine $I_Y$; step (d) by applying Fano's 
inequality; and step (e) from the chain rule of entropy and independence in time. Thus we have obtained 
the desired inequality. ■ 
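
To make the objective in (1) concrete, the following sketch enumerates letter-wise deterministic labelings $U = f(X)$ of a small alphabet and reports the smallest $H(Y|U)$ among those with $H(U) = J$. The constraint $H(U) + H(X|U) = H(X)$ is equivalent to $H(U|X) = 0$, so only functions of $X$ are considered. This is an illustration under stated assumptions, not the paper's method: the toy source, the helper names, and the restriction to single-letter labelings are all hypothetical, and such labelings reach only a finite set of values of $J$.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def labeling_points(p_xy):
    """Return [(H(U), H(Y|U), f)] over all deterministic labelings u = f(x)."""
    xs = sorted({x for x, _ in p_xy})
    points = []
    for labels in itertools.product(range(len(xs)), repeat=len(xs)):
        f = dict(zip(xs, labels))
        p_u, p_uy = {}, {}
        for (x, y), p in p_xy.items():
            u = f[x]
            p_u[u] = p_u.get(u, 0.0) + p
            p_uy[(u, y)] = p_uy.get((u, y), 0.0) + p
        h_u = entropy(p_u.values())
        # H(Y|U) = H(U, Y) - H(U)
        points.append((h_u, entropy(p_uy.values()) - h_u, f))
    return points

def best_suffix_rate(p_xy, J, tol=1e-9):
    """Smallest H(Y|U) among labelings with H(U) = J, or None if none exists."""
    feasible = [m for h_u, m, _ in labeling_points(p_xy) if abs(h_u - J) <= tol]
    return min(feasible) if feasible else None

# Hypothetical toy source: X uniform on {0, 1, 2, 3}; Y equals the parity of X,
# flipped with probability 0.1.
p_xy = {(x, y): 0.25 * (0.9 if y == x % 2 else 0.1)
        for x in range(4) for y in range(2)}

for J in (0.0, 1.0, 2.0):
    print(J, best_suffix_rate(p_xy, J))
```

In this toy source, spending one bit of prefix on the parity of $X$ already reduces the suffix rate to $H(Y|X)$, while spending the same bit on the other half of $X$ does not help at all; the minimization in (1) selects the former.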



B. Further Characterizations 

As in the source coding with side information problem [19]-[21] and several other problems in 
multiterminal information theory, Theorem 1 leaves us to optimize an auxiliary random variable $U$ that 
describes the method of partitioning. Here we will provide simple bounds on $L^*(J)$ and then further 
characterization in terms of a sufficient statistic of $X$ for $Y$. 

Theorem 1 demonstrated that we require 
$$L(J) \ge H(Y|U) + J.$$
The easily achieved corner points discussed previously and a few simple bounds are shown in Fig. 4. 
The bounds, marked by dotted lines, are as follows: 

Fig. 4. Characterizations of the fixed reuse malleability region boundary $L^*(J)$. Each marked point is determined in Section III-C 
and the dotted lines are simple bounds from Section IV-B. With $W$ defined as a minimal sufficient statistic of $X$ for $Y$, the 
solid line shows the unit-slope boundary determined by Theorem 2. The dashed line represents a portion of boundary that is 
unknown (but known to be convex by Theorem 3). 



(a) The lossless source coding theorem applied to $Y$ alone gives $L^*(J) \ge H(Y)$. 

(b) Another trivial lower bound from the construction is $L^*(J) \ge J$. 

(c) Since one could encode $Y_1^n$ without trying to take advantage of the $J$ symbols already available, 
$L^*(J) \le J + H(Y)$. 
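
Bounds (a)-(c) together sandwich the boundary as $\max(H(Y), J) \le L^*(J) \le J + H(Y)$. A minimal sketch of this sandwich follows; the function name `simple_bounds` is an assumption for illustration, and the caller supplies $H(Y)$ in the same units as $J$.

```python
def simple_bounds(J, h_y):
    """Return (lower, upper) with lower <= L*(J) <= upper from bounds (a)-(c)."""
    if J < 0:
        raise ValueError("J must be nonnegative")
    lower = max(h_y, J)   # bounds (a) and (b)
    upper = J + h_y       # bound (c)
    return lower, upper

# Hypothetical example with H(Y) = 1 symbol per source letter.
print(simple_bounds(0.0, 1.0))
print(simple_bounds(2.0, 1.0))
```

At $J = 0$ the two bounds coincide, consistent with the corner point $L^*(0) = H(Y)$.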

In evaluating the properties of $L^*(J)$ further, let $W$ be a minimal sufficient statistic of $X$ for $Y$. 
Intuitively, if $J$ is large enough that one can encode $W$ in the shared segment $U_1^{nJ}$, it is efficient to do 
so. Thus we obtain regimes based on whether $J$ is larger than $H(W)$. 

For the regime of $J > H(W)$, the boundary of the region is linear by the following theorem: 
Theorem 2: Consider the problem of (1). Let $W$ be a minimal sufficient statistic of $X$ for $Y$. For 
$J > H(W)$, the solution is given by: 
$$L^*(J) - J = H(Y|W). \tag{2}$$

Proof: By definition, a sufficient statistic contains all information in $X$ about $Y$. Therefore any rate 
beyond the rate required to transmit the sufficient statistic is not useful. Beyond $H(W)$, the solution is 
linear. ■ 

A rearrangement of (2) is 
$$L^*(J) = H(Y, W) + [J - H(W)].$$
This is used to draw the portion of the boundary determined by Theorem 2 with a solid line in Fig. 4. 
For the regime of $J < H(W)$, we have not determined the boundary but we can show that $L^*(J)$ is 
convex. 
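
The unit-slope regime of Theorem 2 is straightforward to evaluate once a sufficient statistic is in hand. The sketch below computes $H(Y, W) + [J - H(W)]$ for a caller-supplied map `w_of_x`; it does not verify that $W$ is a minimal sufficient statistic, which is left as an assumption on the caller. The toy source and function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def theorem2_boundary(p_xy, w_of_x, J):
    """Evaluate H(Y, W) + [J - H(W)] with W = w_of_x(X), for J not below H(W).

    Assumes (without checking) that W is a minimal sufficient statistic
    of X for Y.
    """
    p_w, p_wy = {}, {}
    for (x, y), p in p_xy.items():
        w = w_of_x(x)
        p_w[w] = p_w.get(w, 0.0) + p
        p_wy[(w, y)] = p_wy.get((w, y), 0.0) + p
    h_w, h_wy = entropy(p_w.values()), entropy(p_wy.values())
    if J < h_w - 1e-12:
        raise ValueError("this formula only covers the regime J >= H(W)")
    return h_wy + (J - h_w)

# Hypothetical toy source: X uniform on {0, 1, 2, 3}; Y equals the parity of X,
# flipped with probability 0.1, so W = X mod 2 carries all of X's information
# about Y.
p_xy = {(x, y): 0.25 * (0.9 if y == x % 2 else 0.1)
        for x in range(4) for y in range(2)}

print(theorem2_boundary(p_xy, lambda x: x % 2, 1.5))
```

Here $H(W) = 1$ bit, so any $J$ between 1 and $H(X) = 2$ bits falls in the linear regime and the suffix rate stays at $H(Y|W)$.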

Theorem 3: Consider the problem of (1). Let $W$ be a minimal sufficient statistic of $X$ for $Y$. For 
$J < H(W)$, the solution $L^*(J)$ is convex. 

Proof: Follows from the convexity of conditional entropy, by mixing possible distributions $U$. ■ 

The convexity from Theorem 3 and the unit slope of $L^*(J)$ for $J > H(W)$ from Theorem 2 yield the 
following theorem by contradiction. An alternative proof is given in Appendix B. 

Theorem 4: The slope of $L^*(J)$ is bounded below and above: 
$$0 \le \frac{d}{dJ} L^*(J) \le 1.$$
The following can be seen as extremal cases for the theorem: when $X$ and $Y$ are independent, $L^*(J) = 
J + H(Y)$ and so $\frac{d}{dJ} L^*(J) = 1$. When $X = Y$, $L^*(J) = H(Y)$ for any $J$, and so $\frac{d}{dJ} L^*(J) = 0$. 

Without regard to the constraint on $J$, it is known that the sufficient statistic for $Y$ upon the observation 
$X = x$ is $p(Y|X = x)$. Therefore for the regime where $J > H(p(Y|X = x))$, this is the best knowledge 
of $Y$ we can endow to the decoder for decoding $Y$. 

The challenge lies when $J < H(p(Y|X = x))$: this is an estimation problem with limited communication 
budget. In a lossy setting, for the special case of jointly Gaussian $X$ and $Y$ this problem may be 
entirely solved by casting it as a linear least-squares estimation problem. 

In fact, (1) can be stated as follows: 
$$\min_{f} H(Y | f(X)) \tag{3}$$
$$\text{s.t.}\quad H(f(X)) = J.$$
It is clear that (1) and (3) are equivalent. In this problem the design of the label is cast as the problem 
of designing a sufficient statistic of $Y$ given $X$, consistent with our previous discussion. The fact that in 
this statement $U$ equals $f(X)$ ensures that $U$ is a subrandom variable of $X$. 

V. Connections 

An alternate method of further analyzing the rate-malleability region for fixed segment reuse is to 
make connections with solved problems in the literature. Here we connect our problem and the lossless 
source coding with coded side information problem [19]-[21]. Source coding with coded side information 
problems provide achievable rate regions for fixed reuse malleability. We also discuss relations to a 
common information problem [22]. If $K = H(X)$ and $L = H(Y)$ are required, then the length of 
the common portion of the source code is less than or equal to $C(X;Y)$, the Gács-Körner common 
information. 



A. Relation to the Coded Side Information Problem 

In this section, we show that rate regions for the coded side information problem (also called the helper 
problem) are achievable rate regions for the malleability problem. Results are expressed in terms of the 
rate triple $\mathcal{R}$ rather than the rate-malleability triple $\mathcal{M}$. 

Definition 2: Let 
$$\mathcal{R}_{\text{help}_1} = \{(R_0, R_1, R_2) : R_0 \ge H(U),\ R_0 + R_1 \ge H(X),\ R_2 \ge H(Y|U)\}$$
where $U$ is any auxiliary random variable. 

Theorem 5: The rate region for the coded side information problem $\mathcal{R}_{\text{help}_1}$ is an achievable rate region 
for the fixed reuse malleability problem, i.e. $\mathcal{R}_{\text{help}_1} \subseteq \mathcal{R}$. 

Proof: The result follows simply by noting that the malleability problem has a more extensive 
information pattern than the coded side information problem (see Fig. 5) and by the achievability result 
for the coded side information problem [20, Theorem 2.1]. Wyner's rate region in the case where the 
side information need not be compressed satisfies $R_0 \ge H(U)$, $R_1 \ge H(X|U)$, and $R_2 \ge H(Y|U)$, 
which implies $\mathcal{R}_{\text{help}_1}$. ■ 

For the malleable coding problem, the auxiliary random variable $U$ may be generated from $X$ and 
will be given to the encoder for $Y$. Lossless source coding is always successively refinable [17], but it 
is unclear how to split off some of the information from $X$ into $U$. 

Fig. 5. The fixed reuse malleable coding problem (left, Fig. 3) has a more extensive information pattern than the coded side 
information problem (right). For fixed reuse, the side information may be designed from $X$ and this side information is available 
at the encoder for $Y$. 

Fig. 6. The fixed reuse malleable coding problem (left) has a more extensive information pattern than the coded side information 
problem (right). For fixed reuse, the coded side information is available at the encoder for $Y$. 



In the result just given, the side information was not compressed and so the rate region was actually a 
Slepian-Wolf region [23] rather than a true coded side information rate region, even though the coded side 
information theorem was invoked in the proof. An alternative comparison leads to the side information 
actually being compressed. In particular, consider the coded side information problem where $X$ is side 
information to be compressed, and $Y$ is the source to be compressed. There is a decoder that takes these 
two things and tries to reproduce $Y$. This describes only the lower branch of the fixed reuse system. The 
upper branch would produce a code to allow lossless reconstruction of $X$ at total rate $R_0 + R_1 \ge H(X)$. 

We focus on the lower branch, studying the trade-off between $R_0$ and $R_2$. This is equivalent to looking 
at $L^*(J)$, as in previous sections. In order to cast an equivalence to the coded side information problem, 
assume that the side information code is not available to the $Y$ encoder. Since the malleable coding 
problem has a more extensive information pattern, this implies that the derived rate region will be an 
achievable region. The lower branch as described is now exactly the coded side information problem 
[19], [20]. 

Definition 3: Let 
$$\mathcal{R}_{\text{help}_2} = \{(R_0, R_2) : R_0 \ge H(Y|U),\ R_2 \ge I(X;U)\}$$
where $U$ is any auxiliary random variable that satisfies the Markov condition $U \leftrightarrow X \leftrightarrow Y$. 

Theorem 6: The rate region for the coded side information problem $\mathcal{R}_{\text{help}_2}$ is an achievable rate region 
for the lower branch of the malleable coding with fixed reuse problem, i.e. $\mathcal{R}_{\text{help}_2} \subseteq \operatorname{proj}_{(R_0, R_2)} \mathcal{R}$. 

Proof: The result follows simply by noting that the malleable coding problem has a more extensive 
information pattern than the coded side information problem (see Fig. 6) and by the achievability result 
for the coded side information problem [19, Theorem 2]. ■ 

Since we are interested in the lower boundary of the rate region, finding $\mathcal{R}_{\text{help}_2}$ may be reduced to 
optimizing the auxiliary random variable $U$ for the coded side information problem, which is also the 
reused segment of the source code for the malleable coding problem. This is usually difficult, but see 
[21], [24]. The optimization problem for $R_0$ as a function of $R_2$ is 
$$F(R_2) = \min_{p(U|X)} H(Y|U) \tag{4}$$
$$\text{s.t.}\quad I(U;X) \le R_2.$$

Interestingly, a problem in machine learning called the information bottleneck problem formulates a 
similar optimization function and provides an alternative operational interpretation of $\mathcal{R}_{\text{help}_2}$ [25], [26].^3 
The optimization problem is 
$$B(R_2) = \max_{p(U|X)} I(Y;U) \tag{5}$$
$$\text{s.t.}\quad I(U;X) \le R_2,$$
which clearly satisfies $F(R_2) = H(Y) - B(R_2)$, since $H(Y)$ is not open to optimization [26]. 
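
The relation $F(R_2) = H(Y) - B(R_2)$ rests on the pointwise identity $H(Y|U) = H(Y) - I(Y;U)$, which holds for every test channel $p(U|X)$ in the common feasible set. The following sketch checks the identity numerically for one stochastic labeling; the toy source, the test channel `p_u_given_x`, and the function name are assumptions for illustration, and $I(Y;U)$ is computed directly from its definition so that the check is not circular.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def side_information_terms(p_xy, p_u_given_x):
    """Return (I(U;X), H(Y|U), I(Y;U), H(Y)) under the Markov chain U - X - Y."""
    p_x, p_y, p_u, p_ux, p_uy = {}, {}, {}, {}, {}
    for (x, y), pxy in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + pxy
        p_y[y] = p_y.get(y, 0.0) + pxy
        for u, pu in p_u_given_x[x].items():
            w = pxy * pu  # p(u, x, y) = p(x, y) p(u | x)
            p_u[u] = p_u.get(u, 0.0) + w
            p_ux[(u, x)] = p_ux.get((u, x), 0.0) + w
            p_uy[(u, y)] = p_uy.get((u, y), 0.0) + w
    i_ux = entropy(p_u.values()) + entropy(p_x.values()) - entropy(p_ux.values())
    h_y_given_u = entropy(p_uy.values()) - entropy(p_u.values())
    # I(Y;U) computed directly from its definition, not from H(Y) - H(Y|U).
    i_yu = sum(p * math.log2(p / (p_u[u] * p_y[y]))
               for (u, y), p in p_uy.items() if p > 0)
    return i_ux, h_y_given_u, i_yu, entropy(p_y.values())

# Hypothetical toy source and a hypothetical noisy labeling p(u|x).
p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
p_u_given_x = {0: {"a": 0.8, "b": 0.2}, 1: {"a": 0.3, "b": 0.7}}

i_ux, h_y_given_u, i_yu, h_y = side_information_terms(p_xy, p_u_given_x)
print(i_ux, h_y_given_u, h_y - i_yu)
```

Because both problems constrain the same quantity $I(U;X)$, minimizing $H(Y|U)$ and maximizing $I(Y;U)$ select the same test channels.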

One can notice that the optimization problem (1) is closely related to the optimization problems that 
arise for the coded side information problem and the information bottleneck problem. In particular, it 
can be noted that the constraint set of (1) is a subset of the constraint set for the coded side information problem. 
Since $I(U;X) = H(U) - H(U|X)$, it follows that $\{p(U|X) : I(U;X) \le R_0\} \supseteq \{p(U|X) : H(U) \le 
R_0 \text{ and } H(U|X) = 0\}$. 

B. Relation to Gács-Körner Common Information 

We have found that rate regions for lossless coding with coded side information are achievable for 
malleable coding; however, computing these regions involves optimizing auxiliary random variables. It 
turns out that for particular ranges of rates, the rate region is actually known in closed form [21]; the 
range is partially delimited by the common information functional of Gács and Körner [22], [27, pp. 
402-404]. The Gács-Körner common information also yields a characterization of malleable coding with 
fixed segment reuse. 

^3 New developments in computing the rate region for the coded side information problem [21], [24] also have implications 
for computing the information bottleneck function [25], [26], though these do not appear to have been exploited. 

Definition 4: For random variables $X$ and $Y$, let $U = f(X) = g(Y)$, where $f$ is a function of $X$ and
$g$ is a function of $Y$ such that $f(X) = g(Y)$ almost surely and the number of values taken by $f$ (or
$g$) with positive probability is the largest possible. Then the Gács-Körner common information, denoted
$C(X;Y)$, is $H(U)$.
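For finite alphabets, the maximal common function in this definition can be found from the support of $p(x, y)$: draw a bipartite graph with an edge between $x$ and $y$ whenever $p(x, y) > 0$, and let $U$ be the index of the connected component. The following is a minimal sketch under that standard characterization; the function name `gacs_korner_common_information` and the dictionary layout of the pmf are illustrative assumptions, not notation from this paper.

```python
import math
from collections import defaultdict

def gacs_korner_common_information(p_xy):
    """Return C(X;Y) in bits for a pmf given as {(x, y): probability}.

    U is taken to be the connected component of the bipartite support
    graph of p(x, y); C(X;Y) is then the entropy of the component masses.
    """
    parent = {}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    # Connect x and y whenever the pair has positive probability.
    for (x, y), p in p_xy.items():
        if p > 0:
            for node in (("x", x), ("y", y)):
                parent.setdefault(node, node)
            union(("x", x), ("y", y))

    # Probability mass of each connected component.
    mass = defaultdict(float)
    for (x, y), p in p_xy.items():
        if p > 0:
            mass[find(("x", x))] += p
    return -sum(p * math.log2(p) for p in mass.values() if p > 0)

# Two disjoint blocks of mass 1/2 each: one bit of common information.
decomposable = {(0, 0): 0.25, (0, 1): 0.25, (2, 2): 0.5}
# Full support: a single component, so the common information is zero.
full_support = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(gacs_korner_common_information(decomposable))
print(abs(gacs_korner_common_information(full_support)))
```

The second example illustrates why the common information is so often zero: a single nonzero entry linking two blocks merges them into one component.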

Definition 5: The joint distribution $p(x, y)$ is indecomposable if there are no functions $f$ and $g$, each
with respect to the domain $W$, so that
• $\Pr[f(X) = g(Y)] = 1$, and
• $f(X)$ takes at least two values with non-zero probability.

It can be shown that $C(X;Y) = 0$ if $X$ and $Y$ have an indecomposable joint distribution. Further
properties of indecomposable joint distributions are given in [27, p. 350] and [21]. In particular, an
auxiliary random variable $U$ that satisfies the Markov relation $U \leftrightarrow X \leftrightarrow Y$ is used for characterization.

Gács and Körner show that the maximal length of the common beginning portion of entropy-achieving
source codes for X and for Y, the operational definition of common information, coincides with the 
informational definition of common information. The basic result, [22, Theorem 1], is that it is not 
possible in general to code two sources so that the resulting codes have some common fixed length of 
order n. This is because in general, p{x, y) is indecomposable and so the common information is zero. 
Such a negative result also carries over to the fixed reuse problem. 

Consider the block diagram for the coding problem that involves the common information in its solution
[27, Fig. P.28 on p. 403], Fig. 7. If it is required that $R_1 = H(X) - R_0$ and that $R_2 = H(Y) - R_0$, then the
largest possible $R_0$ is $C(X;Y)$. Since entropy is being achieved, it follows that $R_2 = H(Y|U)$ through
Slepian-Wolf or conditional entropy means [17]. Since the distributed system does as well as
a centralized system, even if $U$ is given to the encoder for $Y$, this will not improve things. In particular, the system
shown in Fig. 8 will have the same relationship to the common information. Showing this rigorously
involves modifying the converse of the common information proof and seeing that the arguments follow
through. Now one can observe that this block diagram is an enhanced version of the fixed reuse malleable
coding block diagram, redrawn as Fig. 9.

Theorem 7: The Gács-Körner common information rate triple provides a partial converse to the
rate-malleability triple.

Fig. 7. Block diagram for the Gács-Körner common information problem.

Fig. 8. Block diagram for the Gács-Körner common information problem when $U$ is given to the encoder for $Y$. This additional information
does not help in coding.



Proof: The result follows from the fact that the common information problem has a more extensive
information pattern than the fixed reuse malleable coding problem (see Fig. 9) and the converse for the
enhanced common information problem [22]. ■
This theorem gives an outer bound to go with the achievable region defined in Definition 2. Thus for
the malleable coding problem, if we want $K = H(X)$ and $L = H(Y)$, then $M$ must be bad:
$M \ge H(Y) - C(X;Y)$, where $C(X;Y)$ is often zero. Since there is almost no overlap possible when requiring
L = H(Y), allowing larger L in Section UlI-BI was a good approach. 

VI. Discussions and Closing Remarks 

Phrased in the language of waste avoidance and resource recovery: classical Shannon theory shows 
how to optimally reduce; we have here studied reuse and in [1] studied recycling and have found these 
goals to be fundamentally in tension. 

Fig. 9. Block diagram for malleable coding with fixed segment reuse. This has a reduced information pattern as compared to
the Gács-Körner common information problem when $U$ is given to the encoder for $Y$.

We have formulated an information-theoretic problem motivated by the transmission of data to
update the compressed version of a document after it has been edited. Any technique akin to optimally
compressing the difference between the documents would require the receiver to uncompress, apply the 
changes, and recompress. We instead require reuse of a fixed portion of the compressed version of the 
original document; this segment cut from the compressed version of the original document is pasted 
into the compressed version of the new document. This requirement creates a trade-off between the 
amount of reuse and the efficiency in compressing the new document. Theorem [J provides a complete 
characterization as a single-letter information-theoretic optimization. 

We established relationships to several previously-studied multiterminal information theory problems. 
Perhaps the most interesting is with the Gács-Körner common information problem. Through that
relationship one can see that if the original and modified sources have an indecomposable joint distribution 
and are required to be coded close to their entropies, then the reused fraction must asymptotically be 
negligible. We also showed through a Markovianity argument that there is no benefit from coding the 
original source above its entropy. Our focus was therefore on cases where the modified source is encoded 
with excess rate. 

A. On the Effectiveness of Binning 

We informally describe the ineffectiveness of independent, uniform binning. Place the codewords of
$T^n_{[X]\delta}$ that have the same first $nJ$ symbols into the same bin. There are $|V|^{nJ}$ bins, each of which
has $|V|^{n(H(X)-J)}$ elements. Let the bins be labeled by $u_1^{nJ} = 1, \ldots, |V|^{nJ}$. For each of the bins $u_1^{nJ}$
containing some sequences $x_1^n$, create a corresponding bin to contain the conditionally typical sequences
$y_1^n$, given that $x_1^n \in u_1^{nJ}$. This gives the smallest sized bins for $y_1^n$ given that the first $nJ$ symbols of
the representation of $x_1^n$ are the same as the first $nJ$ symbols used to represent $y_1^n$. It is clear that the
representation of $y_1^n$ is not unique, as the same $y_1^n$ may be represented in more than one bin.

For each $x_1^n \in T^n_{[X]\delta}$ there are about $|V|^{nH(Y|X)}$ conditionally typical members of $T^n_{[Y|X]\delta}(x_1^n)$ by
Lemma 3. Through the union bound (Boole's inequality) we obtain that each bin for $y_1^n$ contains at most about

$$|V|^{n(H(X)-J)} \, |V|^{nH(Y|X)}$$

sequences; note that there are $|V|^{nJ}$ such bins. Although this may suggest that the compression of $Y_1^n$ may
require up to $nH(X,Y) = n(H(X) + H(Y|X))$ regardless of the value of $J$, this is not the case.
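The count behind this suggestion can be written out explicitly. The line below is an illustrative sketch that ignores the small typicality slack terms in the exponents (an assumption made only to keep the arithmetic visible) and assumes the `amsmath` package for `\text`: multiplying the number of bins by the union-bound size of each bin cancels $J$ and leaves the joint entropy.

```latex
\[
\underbrace{|V|^{nJ}}_{\text{number of bins}}
\cdot
\underbrace{|V|^{n(H(X)-J)} \, |V|^{nH(Y|X)}}_{\text{union-bound size of each bin}}
= |V|^{n(H(X) + H(Y|X))}
= |V|^{nH(X,Y)} .
\]
```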

The union bound is tight if and only if the sets being combined are disjoint, but it is difficult to examine
the tightness or to find a tighter bound. One might believe that the union bound is tight for any $J > 0$,
implying a rate requirement of $H(X) + H(Y|X) = H(X,Y)$ symbols for the compression of $Y_1^n$ to
have any nontrivial malleability. With the upper bound of Section IV-B we have shown that this belief
is false. Thus the union bound is not tight, and independent, uniform binning [23] fails.

B. Designing Side Information 

Even after characterization by a coding theorem, rate regions in multiterminal information theory 
are notoriously difficult to examine because of optimizations involving auxiliary random variables. For 
several source coding problems with coded side information, achievable rates are characterized by product- 
space characterizations with implicit optimizations over infinite-letter mappings. One can think of these 
optimizations as problems of designing useful side information. For malleable coding problems, the 
design of side information takes central importance. 

For the Slepian-Wolf problem [23], side information formed through random binning is good. For point- 
to-point problems, (side) information formed through quantization binning is good. For other problems, 
however, there is no intuition about optimal auxiliary random variables and the nature of good binning. 
Recent work on the source coding with coded side information problem [19], [20] provides some insight 
into regimes where side information generated through codes like random-binning works and where it 
does not [21]; however, there is no general theory.

One fundamental difference between coding with side information problems and the malleable coding 
problem is the time ordering of when codes are applied. Here, the first source is compressed and then the 
second source is compressed with access to a portion of the actual realization of the compressed version 
of the first source, not just a statistical description. 




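Appendix A
Strong Typicality

Definition 6: The strongly typical set $T^n_{[X]\delta}$ with respect to $p(x)$ is

$$T^n_{[X]\delta} = \left\{ x_1^n : \sum_{x} \left| \frac{N(x; x_1^n)}{n} - p(x) \right| \le \delta \right\},$$

where $N(x; x_1^n)$ is the number of occurrences of $x$ in $x_1^n$ and $\delta > 0$.

Definition 7: The strongly jointly typical set $T^n_{[XY]\delta}$ with respect to $p(x, y)$ is

$$T^n_{[XY]\delta} = \left\{ (x_1^n, y_1^n) : \sum_{x, y} \left| \frac{N(x, y; x_1^n, y_1^n)}{n} - p(x, y) \right| \le \delta \right\}.$$

Definition 8: For any $x_1^n \in T^n_{[X]\delta}$, define a strongly conditionally typical set

$$T^n_{[Y|X]\delta}(x_1^n) = \left\{ y_1^n \in T^n_{[Y]\delta} : (x_1^n, y_1^n) \in T^n_{[XY]\delta} \right\}.$$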
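The three definitions translate directly into membership tests on finite sequences. The following is a minimal sketch assuming the summed-deviation form of strong typicality; the function names, the dictionary representation of the pmfs, and the choice to reject any symbol of zero probability are illustrative assumptions rather than part of the definitions.

```python
from collections import Counter

def is_strongly_typical(seq, pmf, delta):
    """Test sum_x |N(x; seq)/n - p(x)| <= delta for a non-empty sequence."""
    n = len(seq)
    counts = Counter(seq)
    # A symbol with zero probability under pmf is treated as atypical (assumption).
    if any(pmf.get(sym, 0.0) == 0.0 for sym in counts):
        return False
    # Counter returns 0 for symbols of the alphabet that never occur in seq.
    return sum(abs(counts[sym] / n - p) for sym, p in pmf.items()) <= delta

def is_conditionally_typical(y_seq, x_seq, p_y, p_xy, delta):
    """Test membership of y_seq in the conditionally typical set of x_seq.

    y_seq must be typical for p_y, and the pair sequence must be jointly
    typical for p_xy, whose keys are (x, y) tuples.
    """
    pairs = list(zip(x_seq, y_seq))
    return (is_strongly_typical(y_seq, p_y, delta)
            and is_strongly_typical(pairs, p_xy, delta))

# Toy example: Y is a copy of X, with X uniform on {a, b}.
p_x = {"a": 0.5, "b": 0.5}
p_xy = {("a", "a"): 0.5, ("b", "b"): 0.5}
print(is_strongly_typical("abab", p_x, 0.1))
print(is_conditionally_typical("abab", "abab", p_x, p_xy, 0.1))
print(is_conditionally_typical("baba", "abab", p_x, p_xy, 0.1))
```

In the last call the sequence `"baba"` is typical on its own but not jointly typical with `"abab"`, which is exactly the distinction Definition 8 draws.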

Now that we have definitions of typical sets, we put forth some lemmas. 
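Lemma 1 (Strong AEP): Let $\eta$ be a small positive number such that $\eta \to 0$ as $\delta \to 0$. Then for
sufficiently large $n$,

$$\Pr\left[ X_1^n \in T^n_{[X]\delta} \right] > 1 - \delta$$

and

$$(1 - \delta) \, |V|^{n(H(X) - \eta)} \le \left| T^n_{[X]\delta} \right| \le |V|^{n(H(X) + \eta)}.$$

Proof: See [28, Theorem 5.2].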
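Lemma 2 (Strong JAEP): Let $\lambda$ be a small positive number such that $\lambda \to 0$ as $\delta \to 0$. Then for
sufficiently large $n$,

$$\Pr\left[ (X_1^n, Y_1^n) \in T^n_{[XY]\delta} \right] > 1 - \delta$$

and

$$(1 - \delta) \, |V|^{n(H(X,Y) - \lambda)} \le \left| T^n_{[XY]\delta} \right| \le |V|^{n(H(X,Y) + \lambda)}.$$

Proof: See [28, Theorem 5.8].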



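Lemma 3: If $\left| T^n_{[Y|X]\delta}(x_1^n) \right| \ge 1$, then

$$|V|^{n(H(Y|X) - \nu)} \le \left| T^n_{[Y|X]\delta}(x_1^n) \right| \le |V|^{n(H(Y|X) + \nu)},$$

where $\nu \to 0$ as $n \to \infty$ and $\delta \to 0$.
Proof: See [28, Theorem 5.9].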
Lemma 4 (Berger's Markov Lemma): Let $(X, Y, Z)$ form a Markov chain $X \leftrightarrow Y \leftrightarrow Z$. Then for
sufficiently large $n$,

for any $\delta > 0$ and any realization .

Proof: See [29, Lemma 4.1]. ■

Appendix B 
Alternate Proof of Theorem H] 

Proof of upper bound: Let $J_1 > J_2$ be any two values of $J$. Let $V_1$ and $V_2$ be the corresponding
auxiliary random variables U that solve the optimization problem ([U. Then by the successive refinability 
of lossless coding [17], it follows that $V_1$ and $V_2$ will satisfy the Markov chain $V_2 \leftrightarrow V_1 \leftrightarrow X \leftrightarrow Y$.

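By the data processing inequality,

$$I(Y; V_2) \le I(Y; V_1),$$

which, after expanding each term as $I(Y; V_i) = H(V_i) - H(V_i|Y)$ and rearranging, gives

$$H(V_1|Y) - H(V_2|Y) \le H(V_1) - H(V_2).$$

By definition, and since $H(Y|V_i) + H(V_i) = H(Y) + H(V_i|Y)$ by the chain rule,

$$L^*(J_1) - L^*(J_2) = H(Y|V_1) + H(V_1) - H(Y|V_2) - H(V_2) = H(V_1|Y) - H(V_2|Y).$$

Therefore,

$$L^*(J_1) - L^*(J_2) \le H(V_1) - H(V_2) = J_1 - J_2,$$

which implies

$$\frac{L^*(J_1) - L^*(J_2)}{J_1 - J_2} \le 1.$$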

Proof of lower bound: We want to show that $H(V_1|Y) - H(V_2|Y) \ge 0$. This property may be verified
using Yeung's ITIP [28] after invoking the Markov chain $V_2 \leftrightarrow V_1 \leftrightarrow X \leftrightarrow Y$ and the subrandomness
conditions, $H(V_1|X) = H(V_2|X) = 0$.
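Both bounds can be checked numerically on a small example. The following is a minimal sketch, assuming $V_1 = f_1(X)$ and $V_2 = f_2(V_1)$ are deterministic functions, which satisfies the Markov chain and the subrandomness conditions above; the function name `slope_bound_terms`, the maps `f1` and `f2`, and the toy source are illustrative assumptions and are not optimizers of the rate-malleability problem.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def slope_bound_terms(p_xy, f1, f2):
    """Return (H(V1|Y) - H(V2|Y), H(V1) - H(V2)) for V1 = f1(X), V2 = f2(V1)."""
    p_y = defaultdict(float)
    p_v1, p_v2 = defaultdict(float), defaultdict(float)
    p_v1y, p_v2y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        v1 = f1(x)
        v2 = f2(v1)  # V2 depends on X only through V1
        p_y[y] += p
        p_v1[v1] += p
        p_v2[v2] += p
        p_v1y[(v1, y)] += p
        p_v2y[(v2, y)] += p
    h_y = entropy(p_y)
    # H(V|Y) = H(V, Y) - H(Y) for each auxiliary variable.
    cond_gap = (entropy(p_v1y) - h_y) - (entropy(p_v2y) - h_y)
    marg_gap = entropy(p_v1) - entropy(p_v2)
    return cond_gap, marg_gap

# Toy source: X uniform on {0, 1, 2, 3}; Y equals the parity of X w.p. 0.9.
p_xy = {(x, y): 0.25 * (0.9 if y == x % 2 else 0.1)
        for x in range(4) for y in range(2)}
cond_gap, marg_gap = slope_bound_terms(p_xy, lambda x: x, lambda v: v // 2)
print(-1e-12 <= cond_gap <= marg_gap + 1e-12)
```

Here `cond_gap` plays the role of $L^*(J_1) - L^*(J_2)$ and `marg_gap` the role of $J_1 - J_2$, so the printed check corresponds to a slope between 0 and 1 for this particular pair of auxiliary variables.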
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